Movies Exploratory Data Analysis

This notebook uses both dataprep.eda and matplotlib to create basic plots of our fields.


For Matplotlib

After importing pandas, add this import line:

import matplotlib.pyplot as plt


For DataPrep.eda

First, install dataprep from the command prompt or terminal.

(Remember: if you are working in an environment, install it within that environment.)

pip install dataprep

Then add this import line:

from dataprep.eda import plot, plot_correlation

For creating plots for exploratory data analysis, see the documentation here:

To learn more about the DataPrep.eda library:

Imports

Read and Review Data
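
A minimal sketch of the read-and-review step. The file name and exact columns of the movies data are not shown in this outline, so the inline CSV below is a hypothetical stand-in; with a real file, the same checks would follow a call to pd.read_csv on that path.

```python
import io
import pandas as pd

# Hypothetical sample rows; the real dataset and its column list are assumptions.
csv_text = """title,release_date,budget,revenue,runtime,genre1
Movie A,2001-05-18,50000000,120000000,101,Action
Movie B,2005-11-04,2000000,9000000,95,Drama
Movie C,2010-07-16,160000000,825000000,148,Action
"""

# Parse the CSV text into a DataFrame (a file path would work the same way).
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())        # first rows, to eyeball the values
print(df.dtypes)        # column types as parsed
print(df.isna().sum())  # count of missing values per column
```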

Overview Distributions of All Fields

Time Series Analysis of release_date
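
One possible way to look at release_date over time is to parse it as a datetime and count movies per year. The rows below are made-up sample data, and the date format is an assumption.

```python
import pandas as pd

# Hypothetical sample rows standing in for the movies data.
df = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "release_date": ["2001-05-18", "2005-11-04", "2005-12-25", "2010-07-16"],
})

# Convert the text column to datetimes so the .dt accessor is available.
df["release_date"] = pd.to_datetime(df["release_date"])

# Count releases per calendar year, ordered chronologically.
movies_per_year = df["release_date"].dt.year.value_counts().sort_index()
print(movies_per_year)
```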

Univariate Analysis of Numeric Fields

Univariate Analysis of budget
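
A small sketch of a numeric summary for budget, using made-up values. A mean well above the median, together with a positive skewness, is one way to confirm a right-skewed distribution numerically.

```python
import pandas as pd

# Hypothetical budgets in USD; one very large value pulls the tail to the right.
df = pd.DataFrame({
    "budget": [10_000_000, 20_000_000, 30_000_000, 40_000_000, 50_000_000,
               60_000_000, 70_000_000, 80_000_000, 500_000_000],
})

print(df["budget"].describe())  # count, mean, std, min, quartiles, max

mean_budget = df["budget"].mean()
median_budget = df["budget"].median()
skewness = df["budget"].skew()  # positive values indicate a longer right tail

print(f"mean={mean_budget:,.0f}  median={median_budget:,.0f}  skew={skewness:.2f}")
```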

Evaluation: The distribution is skewed to the right by a small number of very high budgets.

We have a lot of outliers with extraordinarily high budgets ...

Let's do the math to see what number defines the upper whisker, beyond which movie budgets are considered outliers.

Outliers are values more than 1.5 × IQR above the 75th percentile (Q3), that is, greater than Q3 + 1.5 × IQR.

Here, outliers are movies with a budget of $91M or greater.
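
The upper-whisker arithmetic can be sketched as follows. The budgets are made-up values, so the threshold they produce differs from the $91M found in the real data; only the calculation is the point.

```python
import pandas as pd

# Hypothetical budgets in USD.
df = pd.DataFrame({
    "budget": [10_000_000, 20_000_000, 30_000_000, 40_000_000, 50_000_000,
               60_000_000, 70_000_000, 80_000_000, 500_000_000],
})

q1 = df["budget"].quantile(0.25)  # 25th percentile
q3 = df["budget"].quantile(0.75)  # 75th percentile
iqr = q3 - q1                     # interquartile range

# Values above this limit are treated as high-side outliers.
upper_whisker = q3 + 1.5 * iqr

outliers = df[df["budget"] > upper_whisker]
print(f"Upper whisker: ${upper_whisker:,.0f}")
print(outliers)
```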

Univariate Analysis of revenue

Univariate Analysis of Categorical / Textual Fields

Analysis of genre
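
For a categorical field, frequency counts are a simple starting point. This sketch assumes the primary genre is stored in a genre1 column at this stage (it is renamed to genre later); the values are invented.

```python
import pandas as pd

# Hypothetical primary-genre values.
df = pd.DataFrame({
    "genre1": ["Drama", "Action", "Drama", "Comedy", "Drama", "Action"],
})

# Absolute counts per genre, most frequent first.
genre_counts = df["genre1"].value_counts()

# The same counts expressed as a share of all movies.
genre_share = df["genre1"].value_counts(normalize=True)

print(genre_counts)
print(genre_share.round(2))
```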

Bivariate Analysis

Budget and Revenue
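
A minimal sketch of a bivariate check: a Pearson correlation plus a matplotlib scatter plot. The numbers are invented, and the same pattern should apply to the other pairs (budget and runtime, revenue and runtime) by swapping column names.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripts; not needed in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical budget and revenue values in USD.
df = pd.DataFrame({
    "budget":  [1_000_000, 5_000_000, 20_000_000, 50_000_000, 100_000_000, 150_000_000],
    "revenue": [3_000_000, 4_000_000, 60_000_000, 90_000_000, 350_000_000, 400_000_000],
})

# Pearson correlation coefficient between the two numeric columns.
r = df["budget"].corr(df["revenue"])
print(f"Pearson r = {r:.3f}")

# Scatter plot of revenue against budget.
fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(df["budget"], df["revenue"])
ax.set_xlabel("budget (USD)")
ax.set_ylabel("revenue (USD)")
ax.set_title("Budget vs. Revenue")
fig.tight_layout()
```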

Budget and Runtime

Revenue and Runtime

Create Calculated Fields

and organize for more efficient analysis

  1. Add profit and ratio columns
  2. Rename genre1 to genre
  3. Reorder columns and drop genres column
  4. Set title as the index
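
The four steps above can be sketched as follows. The sample rows and the format of the genres column are hypothetical. Defining profit as revenue minus budget is an assumption, while ratio follows the Revenue/Budget definition used later in this notebook.

```python
import pandas as pd

# Hypothetical sample rows.
df = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C"],
    "genres": ["Action|Thriller", "Drama", "Action|Sci-Fi"],
    "genre1": ["Action", "Drama", "Action"],
    "budget": [50_000_000, 2_000_000, 160_000_000],
    "revenue": [120_000_000, 9_000_000, 800_000_000],
})

# 1. Add profit and ratio columns.
df["profit"] = df["revenue"] - df["budget"]  # assumed definition of profit
df["ratio"] = df["revenue"] / df["budget"]   # revenue earned per dollar of budget

# 2. Rename genre1 to genre.
df = df.rename(columns={"genre1": "genre"})

# 3. Reorder columns; leaving genres out of the selection drops it.
df = df[["title", "genre", "budget", "revenue", "profit", "ratio"]]

# 4. Set title as the index.
df = df.set_index("title")

print(df)
```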

Create profit Column

Create ratio Column

Rename genre1 to genre (singular) and drop genres

Reorganize Columns

Set title as the DataFrame index

EDA for Calculated Fields

Univariate Analysis of profit

Univariate Analysis of ratio

Interpretation and Questions

Investigate Ratios (Revenue/Budget)

Evaluation

Filter for movies with budgets of at least $50K

Movies with $50K+ Budgets Sorted by Revenue/Budget Ratio
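
A sketch of the filter-and-sort step on made-up data. Very small budgets can produce extreme ratios, which is presumably why a $50K floor is applied before ranking.

```python
import pandas as pd

# Hypothetical movies indexed by title.
df = pd.DataFrame(
    {
        "budget": [10_000, 50_000, 2_000_000, 160_000_000],
        "revenue": [5_000_000, 1_000_000, 9_000_000, 800_000_000],
    },
    index=pd.Index(["Tiny Budget", "Threshold", "Mid", "Big"], name="title"),
)
df["ratio"] = df["revenue"] / df["budget"]

# Keep movies with a budget of at least $50K, then rank by ratio, highest first.
min_budget = 50_000
filtered = df[df["budget"] >= min_budget].sort_values("ratio", ascending=False)

print(filtered)
```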

Averages by Genre

Pivot table and charts showing the average budget, revenue, profit, and ratio per genre.

Sort by ratio and format output
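
One way to build the per-genre averages, sort them by ratio, and format them for display. The data is invented, and the formatting approach (string columns built with str.format) is only one option among several.

```python
import pandas as pd

# Hypothetical movies with a single genre each.
df = pd.DataFrame({
    "genre": ["Action", "Action", "Drama", "Drama", "Comedy"],
    "budget": [100_000_000, 50_000_000, 10_000_000, 30_000_000, 20_000_000],
    "revenue": [300_000_000, 100_000_000, 50_000_000, 30_000_000, 30_000_000],
})
df["profit"] = df["revenue"] - df["budget"]  # assumed definition of profit
df["ratio"] = df["revenue"] / df["budget"]

# Mean of each numeric field per genre, sorted by mean ratio (highest first).
# Note: this averages the per-movie ratios, not total revenue / total budget.
by_genre = pd.pivot_table(
    df,
    index="genre",
    values=["budget", "revenue", "profit", "ratio"],
    aggfunc="mean",
).sort_values("ratio", ascending=False)

# Build a display copy with currency and two-decimal formatting.
formatted = by_genre.copy()
for col in ["budget", "revenue", "profit"]:
    formatted[col] = by_genre[col].map("${:,.0f}".format)
formatted["ratio"] = by_genre["ratio"].map("{:.2f}".format)

print(formatted)
```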

Plot AVG Budget by Genre
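
A minimal matplotlib sketch of a per-genre bar chart, using invented data. The revenue, profit, and ratio charts could follow the same pattern with a different column name and axis label.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripts; not needed in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical movies with a single genre each.
df = pd.DataFrame({
    "genre": ["Action", "Action", "Drama", "Drama", "Comedy"],
    "budget": [100_000_000, 50_000_000, 10_000_000, 30_000_000, 25_000_000],
})

# Average budget per genre, sorted so the largest bar ends up at the top.
avg_budget = df.groupby("genre")["budget"].mean().sort_values()

# Horizontal bars keep long genre names readable.
fig, ax = plt.subplots(figsize=(8, 4))
bars = ax.barh(list(avg_budget.index), avg_budget.values)
ax.set_xlabel("Average budget (USD)")
ax.set_title("Average Budget by Genre")
fig.tight_layout()
```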

Plot AVG Revenue by Genre

Plot AVG Profit by Genre

Plot AVG Ratio by Genre